Abstract: We consider the problem of deciding whether a
highly incomplete signal lies within a given subspace. This
problem, Matched Subspace Detection, is a classical, well-studied
problem when the signal is completely observed. High-dimensional
testing problems in which it may be prohibitive or
impossible to obtain a complete observation motivate this work.
The signal is represented as a vector in $\mathbb{R}^n$, but we only observe
$m \ll n$ of its elements. We show that reliable detection is possible,
under mild incoherence conditions, as long as $m$ is slightly greater
than the dimension of the subspace in question.

I. Introduction 

Testing whether a signal lies within a subspace is a problem 
arising in a wide range of applications including medical [1] 
and hyperspectral [4] imaging, communications [5], radar [7], 
and anomaly detection [11]. The classical formulation of this 
problem is a binary hypothesis test of the following form. Let 
$v \in \mathbb{R}^n$ denote a signal and let $x = v + w$, where $w$ is a
noise of known distribution. We are given a subspace $S \subset \mathbb{R}^n$
and we wish to decide if $v \in S$ or not, based on $x$. Tests
are usually based on some measure of the energy of x in 
the subspace S, and these 'matched subspace detectors' enjoy 
optimal properties [9], [10]. 

This paper considers a variation on this classical problem,
motivated by high-dimensional applications where it
is prohibitive or impossible to measure $v$ completely. We
assume that only a small subset $\Omega \subset \{1, \dots, n\}$ of the
elements of $v$ are observed (with or without noise), and
based on these observations we want to test whether $v \in S$.
For example, consider monitoring a large networked system 
such as a portion of the Internet. Measurement nodes in the 
network may have software that collects measurements such 
as upload and download rate, number of packets, or type of 
traffic given by the packet headers. In order to monitor the 
network these measurements will be collected in a central 
place for compilation, modeling and analysis. The effective 
dimension of the state of such systems is often much lower 
than the extrinsic dimension of the network itself. Subspace 
detection, therefore, can be a useful tool for detecting changes 
or anomalies. The challenge is that it may be impossible to 
obtain every measurement from every point in the network 
due to resource constraints, node outages, etc. 

The main result of this paper answers the following question.
Given a subspace $S$ of dimension $r \ll n$, how many
elements of $v$ must be observed so that we can reliably decide
if it belongs to $S$? The answer is that, under some mild
incoherence conditions, the number is $O(r \log r)$. This means
that reliable matched subspace detectors can be constructed
from very few measurements, making them scalable and
applicable to large-scale testing problems.

The main focus of this paper is an estimator of the energy
of $v$ in $S$ based on observing only the elements $\{v_i\}_{i \in \Omega}$.
Section II proposes the estimator. Section III presents a theorem
giving quantitative bounds on the estimator's performance and
the proof using three lemmas that are proved in the Appendix.
Section IV presents numerical experiments. Section V applies 
the main result to the subspace detection problem, both with 
and without noise. 

II. Energy Estimation from Incomplete Data 

Let $v_\Omega$ be the vector of dimension $|\Omega| \times 1$ comprised of the
elements $v_i$, $i \in \Omega$, ordered lexicographically; here $|\Omega|$ denotes
the cardinality of $\Omega$. The energy of $v$ in the subspace $S$ is
$\|P_S v\|_2^2$, where $P_S$ denotes the projection operator onto $S$.
There are two natural estimators of $\|P_S v\|_2^2$ based on $v_\Omega$. The
first is simply to form the $n \times 1$ vector $v_0$ with elements $v_i$ if
$i \in \Omega$ and zero if $i \notin \Omega$, for $i = 1, \dots, n$. This 'zero-filled'
vector yields the simple estimator $\|P_S v_0\|_2^2$. Filling missing
elements with zero is a fairly common, albeit naive, approach
to dealing with missing data. Unfortunately, the estimator
$\|P_S v_0\|_2^2$ is fundamentally flawed. Even if $v \in S$, the zero-filled
vector $v_0$ does not necessarily lie in $S$.

A better estimator can be constructed as follows. Let $U$
be an $n \times r$ matrix whose columns span the $r$-dimensional
subspace $S$. Note that for any such $U$, $P_S = U(U^T U)^{-1}U^T$.
With this representation in mind, let $U_\Omega$ denote the $|\Omega| \times r$
matrix whose rows are the rows of $U$ indexed by the set
$\Omega$, arranged in lexicographic order. Since we only observe $v$ on
the set $\Omega$, another approach to estimating its energy in $S$ is to
assess how well $v_\Omega$ can be represented in terms of the columns of
$U_\Omega$. Define the projection operator $P_{S_\Omega} := U_\Omega (U_\Omega^T U_\Omega)^\dagger U_\Omega^T$,
where $\dagger$ denotes the pseudoinverse. It follows immediately that
if $v \in S$, then $\|v - P_S v\|_2^2 = 0$ and $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 = 0$,
whereas $\|v_0 - P_S v_0\|_2^2$ can be significantly greater than zero. This
property makes $\|P_{S_\Omega} v_\Omega\|_2^2$ a much better candidate estimator
than $\|P_S v_0\|_2^2$. However, if $|\Omega| \le r$, then it is possible that
$\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 = 0$, even if $\|v - P_S v\|_2^2 > 0$. Our main result
shows that if $|\Omega|$ is just slightly greater than $r$, then with high
probability $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$ is very close to $\frac{|\Omega|}{n}\|v - P_S v\|_2^2$.
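The contrast between the two residuals can be illustrated numerically. The following is a minimal sketch, assuming NumPy and a random Gaussian basis; the function names `restricted_residual` and `zero_filled_residual`, the seed, and the problem sizes are illustrative choices and do not come from the paper.

```python
import numpy as np

def restricted_residual(U, v_obs, omega):
    """||v_Omega - P_{S_Omega} v_Omega||_2^2, using only the observed rows of U."""
    U_omega = U[omega, :]
    # Least squares yields the projection of v_obs onto the column span of U_omega.
    w, *_ = np.linalg.lstsq(U_omega, v_obs, rcond=None)
    res = v_obs - U_omega @ w
    return float(res @ res)

def zero_filled_residual(U, v_obs, omega):
    """||v_0 - P_S v_0||_2^2 for the zero-filled vector v_0."""
    v0 = np.zeros(U.shape[0])
    v0[omega] = v_obs
    Q, _ = np.linalg.qr(U)          # orthonormal basis for S
    res = v0 - Q @ (Q.T @ v0)
    return float(res @ res)

rng = np.random.default_rng(0)
n, r, m = 200, 5, 20                # illustrative sizes
U = rng.standard_normal((n, r))
v_in = U @ rng.standard_normal(r)   # a vector that lies in S
omega = rng.choice(n, size=m, replace=False)

res_restricted = restricted_residual(U, v_in[omega], omega)
res_zero_filled = zero_filled_residual(U, v_in[omega], omega)
```

For a vector in $S$, `res_restricted` should be zero up to rounding error, while `res_zero_filled` is typically far from zero.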

III. Main Theorem 

Let us now focus on our main goal of detecting from a very
small number of samples whether there is energy in a vector
$v$ outside the $r$-dimensional subspace $S$. In order to do so, we
must first quantify how much information we can expect each 
sample to provide. The authors in [2] defined the coherence 
of a subspace S to be the quantity 

$$\mu(S) := \frac{n}{r} \max_j \|P_S e_j\|_2^2 .$$

That is, $\mu(S)$ measures the maximum magnitude attainable
by projecting a standard basis element onto $S$. Note that
$1 \le \mu(S) \le \frac{n}{r}$. The minimum $\mu(S) = 1$ can be attained by
looking at the span of any $r$ columns of the discrete Fourier
transform. Any subspace that contains a standard basis element
will maximize $\mu(S)$. For a vector $z$, we let $\mu(z)$ denote the
coherence of the subspace spanned by $z$. By plugging in the
definition, we have

$$\mu(z) = \frac{n\|z\|_\infty^2}{\|z\|_2^2} .$$
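Both coherence quantities are straightforward to compute from a basis. The sketch below assumes NumPy and a basis matrix with full column rank; the function names `subspace_coherence` and `vector_coherence` are illustrative, not from the paper.

```python
import numpy as np

def subspace_coherence(U):
    """mu(S) = (n/r) * max_j ||P_S e_j||_2^2 for S = column span of U."""
    Q, _ = np.linalg.qr(U)                  # orthonormal basis (assumes full column rank)
    n, r = Q.shape
    # ||P_S e_j||_2^2 equals the squared norm of the j-th row of Q.
    return (n / r) * float(np.max(np.sum(np.abs(Q) ** 2, axis=1)))

def vector_coherence(z):
    """mu(z) = n * ||z||_inf^2 / ||z||_2^2."""
    z = np.asarray(z, dtype=float)
    return z.size * float(np.max(np.abs(z)) ** 2 / np.sum(z ** 2))

n = 8
mu_flat = vector_coherence(np.ones(n))          # perfectly spread vector
mu_spike = vector_coherence(np.eye(n)[0])       # standard basis vector
mu_axis = subspace_coherence(np.eye(n)[:, :2])  # span of two standard basis vectors
```

The three examples hit the extremes stated above: a constant vector attains the minimum value 1, a standard basis vector attains $n$, and a subspace containing standard basis elements attains $n/r$.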



To state our main theorem, write $v = x + y$ where $x \in S$
and $y \in S^\perp$. Let the entries of $v$ be sampled uniformly
with replacement. Again let $\Omega$ refer to the set of indices for
observations of entries in $v$, and denote $|\Omega| = m$. Given these
conventions, we have the following.

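Theorem 1. Let $\delta > 0$ and $m \ge \frac{8}{3} r \mu(S) \log\left(\frac{2r}{\delta}\right)$. Then with
probability at least $1 - 4\delta$,

$$\frac{m(1-\alpha) - r\mu(S)\frac{(1+\beta)^2}{1-\gamma}}{n}\,\|v - P_S v\|_2^2 \le \|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$$

and

$$\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 \le (1+\alpha)\frac{m}{n}\|v - P_S v\|_2^2 ,$$

where $\alpha = \sqrt{\frac{2\mu(y)^2}{m}\log\left(\frac{1}{\delta}\right)}$, $\beta = \sqrt{2\mu(y)\log\left(\frac{1}{\delta}\right)}$, and $\gamma = \sqrt{\frac{8r\mu(S)}{3m}\log\left(\frac{2r}{\delta}\right)}$.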
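To get a feel for the constants, the quantities in Theorem 1 can be evaluated directly. This is a minimal sketch using only the Python standard library; the helper names `theorem1_constants` and `theorem1_bounds` and the example parameter values are illustrative assumptions.

```python
import math

def theorem1_constants(m, r, mu_S, mu_y, delta):
    """Return (alpha, beta, gamma) as defined in Theorem 1."""
    alpha = math.sqrt(2.0 * mu_y ** 2 / m * math.log(1.0 / delta))
    beta = math.sqrt(2.0 * mu_y * math.log(1.0 / delta))
    gamma = math.sqrt(8.0 * r * mu_S / (3.0 * m) * math.log(2.0 * r / delta))
    return alpha, beta, gamma

def theorem1_bounds(m, n, r, mu_S, mu_y, delta):
    """Multipliers (lower, upper) of ||v - P_S v||_2^2, or None if the
    sample-size condition of the theorem is not met."""
    m_min = 8.0 / 3.0 * r * mu_S * math.log(2.0 * r / delta)
    alpha, beta, gamma = theorem1_constants(m, r, mu_S, mu_y, delta)
    if m < m_min or gamma >= 1.0:
        return None
    lower = (m * (1.0 - alpha) - r * mu_S * (1.0 + beta) ** 2 / (1.0 - gamma)) / n
    upper = (1.0 + alpha) * m / n
    return lower, upper

# Illustrative values: perfectly incoherent subspace and residual vector.
too_few = theorem1_bounds(m=10, n=10000, r=5, mu_S=1.0, mu_y=1.0, delta=0.1)
enough = theorem1_bounds(m=5000, n=10000, r=5, mu_S=1.0, mu_y=1.0, delta=0.1)
beta_example = theorem1_constants(100, 5, 1.0, 1.0, math.exp(-2.0))[1]
```

Note that `beta` does not depend on `m`, so it cannot be reduced by taking more samples; with `mu_y = 1` and `delta = exp(-2)` it equals 2.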



Proof: In order to prove the theorem, we split the quantity
of interest into three terms and bound each with high probability.
Consider $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 = \|y_\Omega - P_{S_\Omega} y_\Omega\|_2^2$. Let the
$r$ columns of $U$ be an orthonormal basis for the subspace $S$.
We want to show that

$$\|y_\Omega - P_{S_\Omega} y_\Omega\|_2^2 = \|y_\Omega\|_2^2 - y_\Omega^T U_\Omega \left(U_\Omega^T U_\Omega\right)^{-1} U_\Omega^T y_\Omega \qquad (1)$$

is near $\frac{m}{n}\|y\|_2^2$ with high probability. To proceed, we need the
following three Lemmas whose proofs can be found in the
Appendix.

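Lemma 1. With the same notations as Theorem 1,

$$(1-\alpha)\frac{m}{n}\|y\|_2^2 \le \|y_\Omega\|_2^2 \le (1+\alpha)\frac{m}{n}\|y\|_2^2$$

with probability at least $1 - 2\delta$.

Lemma 2. With the same notations as Theorem 1,

$$\|U_\Omega^T y_\Omega\|_2^2 \le (\beta+1)^2 \frac{m}{n}\frac{r\mu(S)}{n}\|y\|_2^2$$

with probability at least $1 - \delta$.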



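Lemma 3. With the same notations as Theorem 1,

$$\left\|\left(U_\Omega^T U_\Omega\right)^{-1}\right\|_2 \le \frac{n}{(1-\gamma)m}$$

with probability at least $1 - \delta$, provided that $\gamma < 1$.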

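To apply these three Lemmas, write the second term of
Equation (1) as

$$y_\Omega^T U_\Omega \left(U_\Omega^T U_\Omega\right)^{-1} U_\Omega^T y_\Omega = \|W_\Omega U_\Omega^T y_\Omega\|_2^2 ,$$

where $W_\Omega^T W_\Omega = \left(U_\Omega^T U_\Omega\right)^{-1}$. By Lemma 3, $U_\Omega^T U_\Omega$ is
invertible under the assumptions of our theorem, and hence
$W_\Omega$ is well-defined and has spectral norm bounded by the
square root of the inverse of the smallest eigenvalue of $U_\Omega^T U_\Omega$.
That is, we have

$$\|W_\Omega U_\Omega^T y_\Omega\|_2^2 \le \|W_\Omega\|_2^2 \|U_\Omega^T y_\Omega\|_2^2 = \|W_\Omega^T W_\Omega\|_2 \|U_\Omega^T y_\Omega\|_2^2 = \left\|\left(U_\Omega^T U_\Omega\right)^{-1}\right\|_2 \|U_\Omega^T y_\Omega\|_2^2 .$$

$\left\|\left(U_\Omega^T U_\Omega\right)^{-1}\right\|_2$ is bounded by Lemma 3 and $\|U_\Omega^T y_\Omega\|_2^2$ is
bounded by Lemma 2. Putting these two bounds together with
the bounds in Lemma 1 and using the union bound, we have
that with probability at least $1 - 4\delta$

$$(1+\alpha)\frac{m}{n}\|y\|_2^2 \ge \|y_\Omega\|_2^2 - \left\|\left(U_\Omega^T U_\Omega\right)^{-1}\right\|_2 \|U_\Omega^T y_\Omega\|_2^2 \ge (1-\alpha)\frac{m}{n}\|y\|_2^2 - \frac{(\beta+1)^2 r\mu(S)}{(1-\gamma)n}\|y\|_2^2 ,$$

giving us our bound.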
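The decomposition in Equation (1), on which the proof rests, can be checked numerically. The sketch below assumes NumPy; the seed and sizes are arbitrary, and the sampling is uniform with replacement as in the theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 500, 4, 40                                # illustrative sizes
U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # orthonormal basis of S
g = rng.standard_normal(n)
y = g - U @ (U.T @ g)                               # y lies in the orthogonal complement of S

omega = rng.integers(0, n, size=m)                  # uniform sampling with replacement
U_om, y_om = U[omega], y[omega]
G = U_om.T @ U_om                                   # r x r Gram matrix U_Omega^T U_Omega
b = U_om.T @ y_om

# Left side of (1): squared norm of the least-squares residual.
lhs = float(np.sum((y_om - U_om @ np.linalg.solve(G, b)) ** 2))
# Right side of (1): observed energy minus the quadratic form.
rhs = float(y_om @ y_om - b @ np.linalg.solve(G, b))
```

The two values agree up to rounding, and both lie between 0 and $\|y_\Omega\|_2^2$, which is why bounding the quadratic form from above gives a lower bound on the residual.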



IV. Discussion and Numerical Experiments 

In this section we wish to give some intuition for the lower
bound in Theorem 1 and show simulations of the estimate
$\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$. If the parameters $\alpha, \beta, \gamma$ are very near 0, our
lower bound is approximately equal to

$$\frac{m - r\mu(S)}{n}\|v - P_S v\|_2^2 .$$

For an incoherent subspace, the parameter $\mu(S) = 1$. In this
case, for $m \le r$ the bound is $\le 0$, which is consistent with the
fact that $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 = 0$ always for $m \le r$. Once $m \ge r + 1$,
linear algebraic reasoning tells us that $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$ will
be strictly positive with positive probability; Theorem 1 goes
further to say the norm is strictly positive with high probability
once $m = O(r \log r)$.



The parameters $\alpha, \beta, \gamma$ all depend on $\sqrt{\log(1/\delta)}$; these
parameters grow as $\delta$ gets very small. Increasing the number
of observations $m$ will counteract this behavior for $\alpha$ and $\gamma$,
but this does not hold for $\beta$. In fact, even if the vector $y$ is
incoherent and $\mu(y) = 1$, its minimum value, then $\beta = 2$ for
$\delta \approx .135$. To get $\beta$ very near zero, $\delta$ must be very near one,
but this is not a useful regime.

We can see, however, that in simulations these large constants
are somewhat irrelevant; the large deviations analysis
needed for the proof is overly conservative in most cases.



[Figure 1. Panel titles: "Projection Residual, Incoherent" and "Projection Residual, Coherent".]

(a) Incoherent subspace (random Gaussian basis). $\mu(S) \approx 1.5$, $\mu(y) \approx 13.6$.
(b) Coherent subspace. $\mu(S) \approx 4.1$, $\mu(y) \approx 47.0$.

Fig. 1: These plots show the projection residual
$\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$ over 100 simulations. Each of the simulations has a
fixed subspace, vector $v \in S^\perp$ and sample size $m$, but different
sample set $\Omega$ drawn without replacement. The problem size is
$n = 10000$, $r = 50$.



[Figure 2. Panel title: "Zero-Filled Projection, v in S".]

Fig. 2: Simulation results for the zero-filling approach, $v \in S$,
$\|v\|_2 = 1$. The basis used is a random Gaussian basis, $r = 50$,
$n = 10000$, $\mu(S) \approx 1.5$, $\mu(y) \approx 17.9$. Note that the zero-filled
residuals can be made arbitrarily large by increasing $\|v\|_2^2$.

This plays out in the simulations shown in Figure 1, where
we see that for very incoherent subspaces, $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$
is always positive for $m > r\mu(S)\log r$. The plots show the
minimum, maximum and mean value of $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$ over
100 simulations, for fixed $S$ and fixed $v$ such that $\|v\|_2 = 1$
and $v \in S^\perp$. For each value of the sample size $m$, we
sampled 100 different instances of $\Omega$ without replacement,
giving us a realistic idea of how much energy of $v$ is captured
by $m$ samples. Our simulations for the Fourier basis and a
basis made of orthogonalized Gaussian random vectors always
showed the estimate to be positive for $m > r\mu(S)\log r$,
even for the worst-case simulation run. For more coherent
subspaces, we often (but not always) see that the norm is
positive as long as $m > r\mu(S)\log r$.
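An experiment in the spirit of the one described above can be sketched as follows. This is a scaled-down illustration, assuming NumPy and an orthogonalized Gaussian basis; the sizes, seed, and the helper name `residual_energy` are assumptions and do not reproduce the paper's exact setup ($n = 10000$, $r = 50$).

```python
import numpy as np

def residual_energy(U_om, x):
    """||x - P x||_2^2, where P projects onto the column span of U_om."""
    w, *_ = np.linalg.lstsq(U_om, x, rcond=None)
    res = x - U_om @ w
    return float(res @ res)

rng = np.random.default_rng(3)
n, r, trials = 1000, 10, 100                       # scaled-down problem size
U, _ = np.linalg.qr(rng.standard_normal((n, r)))   # orthogonalized Gaussian basis
g = rng.standard_normal(n)
v = g - U @ (U.T @ g)
v /= np.linalg.norm(v)                             # fixed v in S-perp with unit norm

stats = {}                                         # m -> (min, mean, max) over trials
for m in (r, 4 * r, 16 * r):
    vals = []
    for _ in range(trials):
        omega = rng.choice(n, size=m, replace=False)   # sampling without replacement
        vals.append(residual_energy(U[omega], v[omega]))
    stats[m] = (min(vals), float(np.mean(vals)), max(vals))
```

With $m = r$ the residual vanishes up to rounding in every trial, while for larger $m$ even the minimum over trials is strictly positive and the mean grows with $m$.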

V. Matched Subspace Detection 

We have the following detection setup. Our hypotheses are
$H_0 : v \in S$ and $H_1 : v \notin S$ and the test statistic we will use
is

$$t(v_\Omega) = \|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2 \underset{H_0}{\overset{H_1}{\gtrless}} \eta .$$

In the noiseless case, we can let $\eta = 0$; our result in
Theorem 1 shows for $\delta > 0$, the probability of detection is
$P_D = P\left[t(v_\Omega) > 0 \mid H_1\right] \ge 1 - 4\delta$ as long as $m$ is large enough,
and we also have that the probability of false alarm is zero,
$P_{FA} = P\left[t(v_\Omega) > 0 \mid H_0\right] = 0$, since the projection error will
be zero when $v \in S$.

When we introduce noise we have the same hypotheses, but
we compute the statistic on $\tilde{v}_\Omega = v_\Omega + w$ where $w \sim \mathcal{N}(0, 1)$
is Gaussian white noise:

$$t(\tilde{v}_\Omega) = \|\tilde{v}_\Omega - P_{S_\Omega} \tilde{v}_\Omega\|_2^2 \underset{H_0}{\overset{H_1}{\gtrless}} \eta_\lambda .$$

We choose $\eta_\lambda$ to fix the probability of false alarm:

$$P\left[t(\tilde{v}_\Omega) > \eta_\lambda \mid H_0\right] \le \lambda = P_{FA} .$$

Then we have from [9] that $t(\tilde{v}_\Omega)$ is distributed as a non-central
$\chi^2$ with $m - r$ degrees of freedom and non-centrality
parameter $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$, and that $P_D$ is monotonically
increasing with the non-centrality parameter. Putting this together
with Theorem 1 we see that as $m$ grows, $\|v_\Omega - P_{S_\Omega} v_\Omega\|_2^2$
grows and thus the probability of detection grows.
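The noisy test can be illustrated with a small simulation. The sketch below assumes NumPy and, instead of using a $\chi^2$ quantile, calibrates the threshold by Monte Carlo under $H_0$; that substitution, the names `residual_energy`, `eta`, and `lam`, and all sizes are illustrative assumptions rather than the paper's procedure.

```python
import numpy as np

def residual_energy(U_om, x):
    """||x - P x||_2^2, where P projects onto the column span of U_om."""
    w, *_ = np.linalg.lstsq(U_om, x, rcond=None)
    res = x - U_om @ w
    return float(res @ res)

rng = np.random.default_rng(2)
n, r, m, lam = 400, 10, 40, 0.05                   # lam is the target false-alarm rate
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
omega = rng.choice(n, size=m, replace=False)
U_om = U[omega]

# Under H0 the signal part v_Omega lies in the span of U_om and is annihilated,
# so the statistic depends on the noise alone and can be simulated directly.
null_stats = np.array([residual_energy(U_om, rng.standard_normal(m))
                       for _ in range(2000)])
eta = float(np.quantile(null_stats, 1.0 - lam))    # empirical threshold

# An H1 signal in S-perp, scaled so that each entry has energy about 9.
g = rng.standard_normal(n)
y = g - U @ (U.T @ g)
v = 3.0 * np.sqrt(n) * y / np.linalg.norm(y)
alt_stats = np.array([residual_energy(U_om, v[omega] + rng.standard_normal(m))
                      for _ in range(500)])
p_detect = float(np.mean(alt_stats > eta))
```

The empirical mean of `null_stats` is close to $m - r$, matching the degrees of freedom of the residual, and `p_detect` is close to one for a signal this strong.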

We now show why the heuristic approach of zero-filling
the incomplete vector $v_\Omega$ does not work. As we described in
Section II, the zero-filling approach is to fill the vector $v$ with
zeros and then project onto the full subspace $S$. We denote
the zero-filled vector as $v_0$ and then calculate the projection
energy only on the observed entries:

$$t_0(v_\Omega) = \|v_\Omega - (P_S v_0)_\Omega\|_2^2 \underset{H_0}{\overset{H_1}{\gtrless}} \eta .$$

Simple algebraic consideration reveals that $t_0(v_\Omega) \mid H_0$ is positive.
In fact, even in the absence of noise, the probability
of false alarm can be arbitrarily large as $\|v\|_2^2$ increases. The
value of $t_0(v_\Omega) \mid H_0$, based on noiseless observations, is plotted
as a function of the number of measurements in Figure 2.
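The failure of zero-filling under $H_0$ is easy to reproduce. This is a minimal sketch, assuming NumPy and an orthonormal basis; the function name `zero_fill_statistic` and the sizes are illustrative.

```python
import numpy as np

def zero_fill_statistic(U, v_obs, omega):
    """t_0 = ||v_Omega - (P_S v_0)_Omega||_2^2; U must have orthonormal columns."""
    v0 = np.zeros(U.shape[0])
    v0[omega] = v_obs                      # zero-filled vector
    proj = U @ (U.T @ v0)                  # projection onto the full subspace S
    res = v_obs - proj[omega]              # compared on observed entries only
    return float(res @ res)

rng = np.random.default_rng(6)
n, r, m = 400, 10, 80                      # illustrative sizes
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
omega = rng.choice(n, size=m, replace=False)
v = U @ rng.standard_normal(r)             # v lies in S, so H0 holds

t0 = zero_fill_statistic(U, v[omega], omega)
t0_scaled = zero_fill_statistic(U, 10.0 * v[omega], omega)
```

Even though $v \in S$, `t0` is strictly positive, and scaling $v$ by 10 scales the statistic by 100, so any fixed threshold is eventually exceeded.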

We note that for unknown noise power or structured interference,
these results can be extended using the generalized likelihood ratio test (GLRT) [10].

VI. Conclusion 

We have shown that it is possible to detect whether a highly 
incomplete vector has energy outside a subspace. This is a 
fundamental result to add to a burgeoning collection of results 
for incomplete data analysis given a low-rank assumption. 
Missing data are the norm and not the exception in any massive 
data collection system, so this result has implications on many 
other areas of study. 

One of our reviewers shared an insight that the process by
which we observe some components and observe erasures in
other components can be expressed as a projection operator.
It may be possible to extend the results of Theorem 1 to a
wide class of models of random projection operators beyond
the class of deletion operators studied here.

Appendix

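We will need the following two large deviation bounds in
the proofs of our Lemmas below.

Theorem 2 (McDiarmid's Inequality [6]). Let $X_1, \dots, X_n$ be
independent random variables, and assume $f$ is a function for
which there exist $t_i$, $i = 1, \dots, n$ satisfying

$$\sup_{x_1, \dots, x_n, \hat{x}_i} \left|f(x_1, \dots, x_n) - f(x_1, \dots, \hat{x}_i, \dots, x_n)\right| \le t_i ,$$

where $\hat{x}_i$ indicates replacing the sample value $x_i$ with any
other of its possible values. Call $f(X_1, \dots, X_n) := Y$. Then
for any $\epsilon > 0$,

$$P\left[Y \ge E[Y] + \epsilon\right] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^n t_i^2}\right) \qquad (2)$$

$$P\left[Y \le E[Y] - \epsilon\right] \le \exp\left(\frac{-2\epsilon^2}{\sum_{i=1}^n t_i^2}\right) \qquad (3)$$

Theorem 3 (Noncommutative Bernstein Inequality
[3], [8]). Let $X_1, \dots, X_m$ be independent zero-mean
square $r \times r$ random matrices. Suppose
$\rho_k^2 = \max\left\{\|E[X_k X_k^T]\|_2, \|E[X_k^T X_k]\|_2\right\}$ and $\|X_k\|_2 \le M$
almost surely for all $k$. Then for any $\tau > 0$,

$$P\left[\left\|\sum_{k=1}^m X_k\right\|_2 > \tau\right] \le 2r \exp\left(\frac{-\tau^2/2}{\sum_{k=1}^m \rho_k^2 + M\tau/3}\right) .$$

We now proceed with the proof of our three central Lemmas.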

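Proof of Lemma 1: To prove this we use McDiarmid's
inequality from Theorem 2 for the function $f(X_1, \dots, X_m) = \sum_{i=1}^m X_i$.
The resulting inequality is more commonly referred
to as Hoeffding's inequality.

We begin with the first inequality. Set $X_i = y_{\Omega(i)}^2$. We seek
a good value for $t_i$. Since $y_{\Omega(i)}^2 \le \|y\|_\infty^2$ for all $i$, we have

$$\left|\sum_{i=1}^m X_i - \sum_{i \ne k} X_i - \hat{X}_k\right| = \left|X_k - \hat{X}_k\right| \le 2\|y\|_\infty^2 .$$

We calculate $E\left[\sum_{i=1}^m X_i\right]$ as follows. Define $I\{\cdot\}$ to be the
indicator function, and assume that the samples are taken
uniformly with replacement.

$$E\left[\sum_{i=1}^m X_i\right] = E\left[\sum_{i=1}^m y_{\Omega(i)}^2\right] = \sum_{i=1}^m E\left[\sum_{j=1}^n y_j^2 I\{\Omega(i) = j\}\right] = \frac{m}{n}\|y\|_2^2 .$$

Plugging into Equation (3), the left hand side is

$$P\left[\sum_{i=1}^m X_i \le \frac{m}{n}\|y\|_2^2 - \epsilon\right]$$

and letting $\epsilon = \alpha\frac{m}{n}\|y\|_2^2$, we then have that this probability is
bounded by

$$\exp\left(\frac{-2\alpha^2 \left(\frac{m}{n}\right)^2 \|y\|_2^4}{4m\|y\|_\infty^4}\right) .$$

Thus, the resulting probability bound is

$$P\left[\|y_\Omega\|_2^2 \ge (1-\alpha)\frac{m}{n}\|y\|_2^2\right] \ge 1 - \exp\left(\frac{-\alpha^2 m \|y\|_2^4}{2n^2\|y\|_\infty^4}\right) . \qquad (4)$$

Substituting our definitions of $\mu(y)$ and $\alpha$ shows that the lower
bound holds with probability at least $1 - \delta$. The argument
for the upper bound is identical after using Equation (2)
in place of (3). The Lemma now follows by applying the union
bound. ■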
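The concentration stated in Lemma 1 can be observed empirically. The following sketch assumes NumPy; the test vector (uniform entries in $[1, 2]$, chosen so that $\mu(y)$ is small), the seed, and the sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, delta, trials = 1000, 500, 0.1, 2000
y = rng.uniform(1.0, 2.0, size=n)                 # deliberately incoherent test vector
mu_y = n * float(np.max(np.abs(y)) ** 2 / np.sum(y ** 2))
alpha = float(np.sqrt(2.0 * mu_y ** 2 / m * np.log(1.0 / delta)))

target = m / n * float(np.sum(y ** 2))            # (m/n) * ||y||_2^2
omega = rng.integers(0, n, size=(trials, m))      # uniform sampling with replacement
energy = np.sum(y[omega] ** 2, axis=1)            # ||y_Omega||_2^2 for each trial

inside = (energy >= (1 - alpha) * target) & (energy <= (1 + alpha) * target)
coverage = float(np.mean(inside))
```

The lemma guarantees `coverage` of at least $1 - 2\delta$; in this example the observed fraction is considerably higher, which is consistent with the remark that the large deviations analysis is conservative.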
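Proof of Lemma 2: We use McDiarmid's inequality in
a very similar fashion to the proof of Lemma 1. Let
$X_i = y_{\Omega(i)} U_{\Omega(i)}$, where $\Omega(i)$ refers to the $i$-th sample index. Thus
$y_{\Omega(i)}$ is a scalar, and the notation $U_{\Omega(i)}$ refers to an $r \times 1$
vector representing the transpose of the $\Omega(i)$-th row of $U$.

Let our function $f(X_1, \dots, X_m) = \left\|\sum_{i=1}^m X_i\right\|_2 = \|U_\Omega^T y_\Omega\|_2$.
To find the $t_i$ of the theorem we first need to
bound $\|X_i\|_2$ for all $i$. Observe that $\|U_{\Omega(i)}\|_2 = \|U^T e_{\Omega(i)}\|_2 = \|P_S e_{\Omega(i)}\|_2 \le \sqrt{r\mu(S)/n}$
by assumption. Thus,

$$\|X_i\|_2 \le |y_{\Omega(i)}| \, \|U_{\Omega(i)}\|_2 \le \|y\|_\infty \sqrt{r\mu(S)/n} .$$

Then observe that $\left|f(X_1, \dots, X_m) - f(X_1, \dots, \hat{X}_k, \dots, X_m)\right|$ is

$$\left|\left\|\sum_{i=1}^m X_i\right\|_2 - \left\|\sum_{i \ne k} X_i + \hat{X}_k\right\|_2\right| \le \|X_k - \hat{X}_k\|_2 \le \|X_k\|_2 + \|\hat{X}_k\|_2 \le 2\|y\|_\infty \sqrt{\frac{r\mu(S)}{n}} .$$

Here, the first two inequalities follow from the triangle inequality.
Next we calculate a bound for $E\left[f(X_1, \dots, X_m)\right] = E\left[\left\|\sum_{i=1}^m X_i\right\|_2\right]$.
Assume again that the samples are taken
uniformly with replacement. We have

$$\sum_{k=1}^r U_{jk}^2 = \|P_S e_j\|_2^2 \le \frac{r}{n}\mu(S) ,$$

from which we can see that

$$E\left[\left\|\sum_{i=1}^m X_i\right\|_2^2\right] = E\left[\sum_{k=1}^r \left(\sum_{i=1}^m U_{\Omega(i)k} \, y_{\Omega(i)}\right)^2\right]$$

$$= \sum_{k=1}^r \sum_{i=1}^m E\left[\sum_{j=1}^n U_{jk}^2 y_j^2 I\{\Omega(i) = j\}\right] \qquad (5)$$

$$= \frac{m}{n} \sum_{k=1}^r \sum_{j=1}^n U_{jk}^2 y_j^2 \qquad (6)$$

$$\le \frac{m r \mu(S)}{n^2}\|y\|_2^2 .$$

The step (5) follows because the cross terms cancel by
orthogonality. The step (6) is because of our assumption that
sampling is uniform with replacement.

Since $E\left[\|X\|_2\right] \le E\left[\|X\|_2^2\right]^{1/2}$ by Jensen's inequality, we
have that $E\left[\left\|\sum_{i=1}^m X_i\right\|_2\right] \le \frac{\sqrt{m r \mu(S)}}{n}\|y\|_2$. Letting
$\epsilon = \beta \frac{\sqrt{m r \mu(S)}}{n}\|y\|_2$ and plugging into Equation (2), we then
have that the probability is bounded by

$$\exp\left(\frac{-2\beta^2 m r \mu(S) \|y\|_2^2 / n^2}{4m\|y\|_\infty^2 \, r\mu(S)/n}\right) .$$

Thus, the resulting probability bound is

$$P\left[\|U_\Omega^T y_\Omega\|_2^2 \ge (1+\beta)^2 \frac{m r \mu(S)}{n^2}\|y\|_2^2\right] \le \exp\left(\frac{-\beta^2 \|y\|_2^2}{2n\|y\|_\infty^2}\right) .$$

Substituting our definitions of $\mu(y)$ and $\beta$ shows that the
bound holds with probability at least $1 - \delta$, completing the
proof. ■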
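Proof of Lemma 3: We use the Noncommutative Bernstein
Inequality as follows. Let $X_k = U_{\Omega(k)} U_{\Omega(k)}^T - \frac{1}{n} I_r$, where
the notation $U_{\Omega(k)}$ is as before, i.e. is the transpose of the
$\Omega(k)$-th row of $U$, and $I_r$ is the $r \times r$ identity matrix. Note
that this random variable is zero mean.

We must compute $\rho_k^2$ and $M$. Since $\Omega(k)$ is chosen uniformly
with replacement, the $X_k$ are identically distributed,
and $\rho$ does not depend on $k$. For ease of notation we will
denote $U_{\Omega(k)}$ as $U_k$.

Using the fact that for positive semi-definite matrices,
$\|A - B\|_2 \le \max\{\|A\|_2, \|B\|_2\}$, and recalling again that
$\|U_k\|_2^2 = \|U^T e_{\Omega(k)}\|_2^2 = \|P_S e_{\Omega(k)}\|_2^2 \le r\mu(S)/n$, we have

$$\left\|U_k U_k^T - \frac{1}{n} I_r\right\|_2 \le \max\left\{\frac{r\mu(S)}{n}, \frac{1}{n}\right\} ,$$

and we let $M := r\mu(S)/n$.
For $\rho$, we note

$$\left\|E\left[X_k X_k^T\right]\right\|_2 = \left\|E\left[X_k^T X_k\right]\right\|_2 = \left\|E\left[\left(U_k U_k^T - \frac{1}{n} I_r\right)^2\right]\right\|_2 = \left\|E\left[U_k U_k^T U_k U_k^T\right] - \frac{1}{n^2} I_r\right\|_2$$

$$\le \max\left\{\left\|E\left[U_k U_k^T U_k U_k^T\right]\right\|_2, \frac{1}{n^2}\right\} \le \max\left\{\frac{r\mu(S)}{n}\left\|E\left[U_k U_k^T\right]\right\|_2, \frac{1}{n^2}\right\} = \max\left\{\frac{r\mu(S)}{n^2}, \frac{1}{n^2}\right\} = \frac{r\mu(S)}{n^2} .$$

Thus we let $\rho^2 := r\mu(S)/n^2$.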



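Now we can apply the Noncommutative Bernstein Inequality,
Theorem 3. First we restrict $\tau$ to be such that $M\tau \le m\rho^2$
to simplify the denominator of the exponent. Then we get that

$$2r \exp\left(\frac{-\tau^2/2}{m\rho^2 + M\tau/3}\right) \le 2r \exp\left(\frac{-\tau^2/2}{\frac{4}{3} m \frac{r\mu(S)}{n^2}}\right)$$

and thus

$$P\left[\left\|\sum_{k \in \Omega} \left(U_k U_k^T - \frac{1}{n} I_r\right)\right\|_2 > \tau\right] \le 2r \exp\left(\frac{-3n^2\tau^2}{8 m r \mu(S)}\right) .$$

Now take $\tau = \gamma m/n$ with $\gamma$ defined in the statement of
Theorem 1. Since $\gamma < 1$ by assumption, $M\tau \le m\rho^2$ holds
and we have

$$P\left[\left\|\sum_{k \in \Omega} U_k U_k^T - \frac{m}{n} I_r\right\|_2 \le \frac{m}{n}\gamma\right] \ge 1 - \delta .$$

We note that $\left\|\sum_{k \in \Omega} U_k U_k^T - \frac{m}{n} I_r\right\|_2 \le \frac{m}{n}\gamma$ implies that the
minimum singular value of $\sum_{k \in \Omega} U_k U_k^T$ is at least $(1 - \gamma)\frac{m}{n}$.
This in turn implies that

$$\left\|\left(\sum_{k \in \Omega} U_k U_k^T\right)^{-1}\right\|_2 \le \frac{n}{(1-\gamma)m} ,$$

which completes the proof.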
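Lemma 3 can also be checked by simulation. The sketch below assumes NumPy and an orthogonalized Gaussian basis; the sizes and seed are illustrative, and $m$ is chosen large enough that $\gamma < 1$ for this basis.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, m, delta, trials = 1000, 5, 900, 0.1, 300
U, _ = np.linalg.qr(rng.standard_normal((n, r)))          # orthonormal basis of S
mu_S = (n / r) * float(np.max(np.sum(U ** 2, axis=1)))    # coherence of S
gamma = float(np.sqrt(8.0 * r * mu_S / (3.0 * m) * np.log(2.0 * r / delta)))

bound = n / ((1.0 - gamma) * m)          # Lemma 3 bound on ||(U_Omega^T U_Omega)^{-1}||_2
hits = 0
for _ in range(trials):
    omega = rng.integers(0, n, size=m)   # uniform sampling with replacement
    G = U[omega].T @ U[omega]            # r x r matrix, sum of U_k U_k^T over the samples
    # eigvalsh returns eigenvalues in ascending order; the inverse of the
    # smallest one is the spectral norm of G^{-1}.
    inv_norm = 1.0 / float(np.linalg.eigvalsh(G)[0])
    hits += int(inv_norm <= bound)
coverage = hits / trials
```

The lemma guarantees that the bound holds in at least a $1 - \delta$ fraction of trials whenever $\gamma < 1$; empirically the smallest eigenvalue of $U_\Omega^T U_\Omega$ stays well above $(1 - \gamma) m / n$.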



